Estimating a Kernel Fisher Discriminant in the Presence of Label Noise
Abstract
Data noise is present in many machine learning problem domains; some forms are well studied, but others have received less attention. In this paper we propose an algorithm for constructing a kernel Fisher discriminant (KFD) from training examples with noisy labels. The approach associates with each example a probability that its label has been flipped. We utilise an expectation maximization (EM) algorithm to update these probabilities. The E-step uses class-conditional probabilities estimated as a by-product of the KFD algorithm. The M-step updates the flip probabilities and determines the parameters of the discriminant. We demonstrate the feasibility of the approach on two real-world datasets.
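The alternation described in the abstract can be illustrated with a simplified sketch. It is not the paper's algorithm: it assumes an RBF kernel, a ridge-regularised KFD fitted with soft class memberships, Gaussian class-conditional densities on the one-dimensional projection, and one flip rate per class. All function and parameter names (`rbf_kernel`, `weighted_kfd`, `em_noisy_kfd`, `gamma`, `reg`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def weighted_kfd(K, r, reg):
    """Soft-label KFD: r[i] is the current belief that sample i is truly class 1."""
    n = K.shape[0]
    N = reg * np.eye(n)               # regularised within-class scatter
    means = []
    for w in (1.0 - r, r):            # soft memberships of class 0, then class 1
        m = K @ w / w.sum()           # weighted class mean of the kernel columns
        Kc = K - m[:, None]           # centre every column on that mean
        N += (Kc * w) @ Kc.T          # sum_j w_j (K_j - m)(K_j - m)^T
        means.append(m)
    return np.linalg.solve(N, means[1] - means[0])

def em_noisy_kfd(X, t, gamma=0.5, reg=1e-2, n_iter=25):
    """t holds observed (possibly flipped) labels in {0, 1} as an integer array."""
    K = rbf_kernel(X, X, gamma)
    r = np.where(t == 1, 0.8, 0.2)    # assumed soft initial belief in observed labels
    eps = 1e-6
    for _ in range(n_iter):
        # M-step: discriminant, 1-D Gaussian class conditionals, flip rates.
        alpha = weighted_kfd(K, r, reg)
        y = K @ alpha                 # projections of the training points
        log_post = np.empty((2, len(t)))
        flip = np.empty(2)
        for c, w in enumerate((1.0 - r, r)):
            mu = w @ y / w.sum()
            var = w @ (y - mu) ** 2 / w.sum() + 1e-12
            # flip[c] estimates P(observed label != c | true label = c)
            flip[c] = np.clip(w[t != c].sum() / w.sum(), eps, 1.0 - eps)
            log_lik = -0.5 * (y - mu) ** 2 / var - 0.5 * np.log(2 * np.pi * var)
            log_obs = np.where(t == c, np.log1p(-flip[c]), np.log(flip[c]))
            log_post[c] = np.log(w.mean() + eps) + log_lik + log_obs
        # E-step: posterior probability that the true label is 1.
        log_post -= log_post.max(axis=0)
        post = np.exp(log_post)
        r = post[1] / post.sum(axis=0)
    return alpha, r, flip
```

Calling `em_noisy_kfd(X, t)` on a feature matrix `X` and an integer label vector `t` returns the dual coefficients, the per-example beliefs `r`, and the two class-level flip rates. For an example observed as class 1, `1 - r[i]` plays the role of the per-example flip probability mentioned in the abstract (and `r[i]` for an example observed as class 0). The posterior is computed in log space so that very well separated projections do not underflow.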
Similar resources
Learning kernel logistic regression in the presence of class label noise
The classical machinery of supervised learning machines relies on a correct set of training labels. Unfortunately, there is no guarantee that all of the labels are correct. Labelling errors are increasingly noticeable in today’s classification tasks, as the scale and difficulty of these tasks increase so much that perfect label assignment becomes nearly impossible. Several algorithms have been...
Convex Multiview Fisher Discriminant Analysis
CCA can be seen as a multiview extension of PCA, in which information from two sources is used for learning by finding a subspace in which the two views are most correlated. However PCA, and by extension CCA, does not use label information. Fisher Discriminant Analysis uses label information to find informative projections, which can be more informative in supervised learning settings. We deriv...
Probabilistic Fisher discriminant analysis
Fisher Discriminant Analysis (FDA) is a powerful and popular method for dimensionality reduction and classification which unfortunately performs poorly in the presence of label noise and sparse labeled data. To overcome these limitations, we propose a probabilistic framework for FDA and extend it to the semi-supervised case. Experiments on real-world datasets show that the proposed approach w...
Classification in the Presence of Class Noise
In machine learning, class noise occurs frequently and deteriorates the classifier derived from the noisy dataset. This paper presents several possible solutions to this problem based on LSA, a probabilistic noise model proposed by Lawrence and Schölkopf (2001). These solutions include the Clustering-based Probabilistic Algorithm (CPA), the Probabilistic Fisher (PF), and the Probabilis...
An Effective Approach for Robust Metric Learning in the Presence of Label Noise
Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...
Publication date: 2001